Goto

Collaborating Authors

 Nancy


Unveiling Biases while Embracing Sustainability: Assessing the Dual Challenges of Automatic Speech Recognition Systems

arXiv.org Artificial Intelligence

Unveiling Biases while Embracing Sustainability: Assessing the Dual Challenges of Automatic Speech Recognition Systems Ajinkya Kulkarni 1, 2, Atharva Kulkarni 3, Miguel Couceiro 4, 5, Isabel Trancoso 5 1 IDIAP, Switzerland, 2 MBZUAI, UAE, 3 Erisha Labs, India 4 Universit e de Lorraine, CNRS, LORIA, Nancy, France 5 INESC-ID, IST, Universidade de Lisboa, Portugal ajinkya.kulkarni@idiap.ch Abstract In this paper, we present a bias and sustainability focused investigation of Automatic Speech Recognition (ASR) systems, namely Whisper and Massively Multilingual Speech (MMS), which have achieved state-of-the-art (SOT A) performances. Despite their improved performance in controlled settings, there remains a critical gap in understanding their efficacy and equity in real-world scenarios. In addition, we examine the environmental impact of ASR systems, scrutinizing the use of large acoustic models on carbon emission and energy consumption. We also provide insights into our empirical analyses, offering a valuable contribution to the claims surrounding bias and sustainability in ASR systems. Index T erms: ASR, Bias, carbon footprint, sustainability 1. Introduction The advent of large deep neural networks (DNNs) has brought about substantial advancements in various speech-processing applications, notably in speech recognition.


Inverse Reinforcement Learning through Structured Classification Supélec - IMS-MaLIS Research Group Nancy, France

Neural Information Processing Systems

This paper adresses the inverse reinforcement learning (IRL) problem, that is inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multiclass classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Contrary to most of existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed with only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator.


Parametric-Task MAP-Elites

arXiv.org Artificial Intelligence

Optimizing a set of functions simultaneously by leveraging their similarity is called multi-task optimization. Current black-box multi-task algorithms only solve a finite set of tasks, even when the tasks originate from a continuous space. In this paper, we introduce Parametric-task MAP-Elites (PT-ME), a novel black-box algorithm to solve continuous multi-task optimization problems. This algorithm (1) solves a new task at each iteration, effectively covering the continuous space, and (2) exploits a new variation operator based on local linear regression. The resulting dataset of solutions makes it possible to create a function that maps any task parameter to its optimal solution. We show on two parametric-task toy problems and a more realistic and challenging robotic problem in simulation that PT-ME outperforms all baselines, including the deep reinforcement learning algorithm PPO.


Cross Domain Early Crop Mapping using CropGAN and CNN Classifier

arXiv.org Artificial Intelligence

Driven by abundant satellite imagery, machine learning-based approaches have recently been promoted to generate high-resolution crop cultivation maps to support many agricultural applications. One of the major challenges faced by these approaches is the limited availability of ground truth labels. In the absence of ground truth, existing work usually adopts the "direct transfer strategy" that trains a classifier using historical labels collected from other regions and then applies the trained model to the target region. Unfortunately, the spectral features of crops exhibit inter-region and inter-annual variability due to changes in soil composition, climate conditions, and crop progress, the resultant models perform poorly on new and unseen regions or years. This paper presents the Crop Generative Adversarial Network (CropGAN) to address the above cross-domain issue. Our approach does not need labels from the target domain. Instead, it learns a mapping function to transform the spectral features of the target domain to the source domain (with labels) while preserving their local structure. The classifier trained by the source domain data can be directly applied to the transformed data to produce high-accuracy early crop maps of the target domain. Comprehensive experiments across various regions and years demonstrate the benefits and effectiveness of the proposed approach. Compared with the widely adopted direct transfer strategy, the F1 score after applying the proposed CropGAN is improved by 13.13% - 50.98%


Stability of Q-Learning Through Design and Optimism

arXiv.org Artificial Intelligence

Q-learning has become an important part of the reinforcement learning toolkit since its introduction in the dissertation of Chris Watkins in the 1980s. The purpose of this paper is in part a tutorial on stochastic approximation and Q-learning, providing details regarding the INFORMS APS inaugural Applied Probability Trust Plenary Lecture, presented in Nancy France, June 2023. The paper also presents new approaches to ensure stability and potentially accelerated convergence for these algorithms, and stochastic approximation in other settings. Two contributions are entirely new: 1. Stability of Q-learning with linear function approximation has been an open topic for research for over three decades. It is shown that with appropriate optimistic training in the form of a modified Gibbs policy, there exists a solution to the projected Bellman equation, and the algorithm is stable (in terms of bounded parameter estimates). Convergence remains one of many open topics for research. 2. The new Zap Zero algorithm is designed to approximate the Newton-Raphson flow without matrix inversion. It is stable and convergent under mild assumptions on the mean flow vector field for the algorithm, and compatible statistical assumption on an underlying Markov chain. The algorithm is a general approach to stochastic approximation which in particular applies to Q-learning with "oblivious" training even with non-linear function approximation.


Learnable Nonlinear Compression for Robust Speaker Verification

arXiv.org Artificial Intelligence

In this study, we focus on nonlinear compression methods in spectral features for speaker verification based on deep neural network. We consider different kinds of channel-dependent (CD) nonlinear compression methods optimized in a data-driven manner. Our methods are based on power nonlinearities and dynamic range compression (DRC). We also propose multi-regime (MR) design on the nonlinearities, at improving robustness. Results on VoxCeleb1 and VoxMovies data demonstrate improvements brought by proposed compression methods over both the commonly-used logarithm and their static counterparts, especially for ones based on power function. While CD generalization improves performance on VoxCeleb1, MR provides more robustness on VoxMovies, with a maximum relative equal error rate reduction of 21.6%.


Optimizing Multi-Taper Features for Deep Speaker Verification

arXiv.org Artificial Intelligence

Multi-taper estimators provide low-variance power spectrum estimates that can be used in place of the windowed discrete Fourier transform (DFT) to extract speech features such as mel-frequency cepstral coefficients (MFCCs). Even if past work has reported promising automatic speaker verification (ASV) results with Gaussian mixture model-based classifiers, the performance of multi-taper MFCCs with deep ASV systems remains an open question. Instead of a static-taper design, we propose to optimize the multi-taper estimator jointly with a deep neural network trained for ASV tasks. With a maximum improvement on the SITW corpus of 25.8% in terms of equal error rate over the static-taper, our method helps preserve a balanced level of leakage and variance, providing more robustness.


Optimized Power Normalized Cepstral Coefficients towards Robust Deep Speaker Verification

arXiv.org Artificial Intelligence

After their introduction to robust speech recognition, power normalized cepstral coefficient (PNCC) features were successfully adopted to other tasks, including speaker verification. However, as a feature extractor with long-term operations on the power spectrogram, its temporal processing and amplitude scaling steps dedicated on environmental compensation may be redundant. Further, they might suppress intrinsic speaker variations that are useful for speaker verification based on deep neural networks (DNN). Therefore, in this study, we revisit and optimize PNCCs by ablating its medium-time processor and by introducing channel energy normalization. Experimental results with a DNN-based speaker verification system indicate substantial improvement over baseline PNCCs on both in-domain and cross-domain scenarios, reflected by relatively 5.8% and 61.2% maximum lower equal error rate on VoxCeleb1 and VoxMovies, respectively.



Using exoskeletons to assist medical staff during prone positioning of mechanically ventilated COVID-19 patients: a pilot study

arXiv.org Artificial Intelligence

We conducted a pilot study to evaluate the potential and feasibility of back-support exoskeletons to help the caregivers in the Intensive Care Unit (ICU) of the University Hospital of Nancy (France) executing Prone Positioning (PP) maneuvers on patients suffering from severe COVID-19-related Acute Respiratory Distress Syndrome. After comparing four commercial exoskeletons, the Laevo passive exoskeleton was selected and used in the ICU in April 2020. The first volunteers using the Laevo reported very positive feedback and reduction of effort, confirmed by EMG and ECG analysis. Laevo has been since used to physically assist during PP in the ICU of the Hospital of Nancy, following the recrudescence of COVID-19, with an overall positive feedback.